Anti-aliasing

■ Practical Deferred MSAA 

■ Temporal Antialiasing: SMAA 1TX

Camera Post-Processing 

■ Depth of Field 

■ Motion Blur 

Sharing results from ongoing research 

■ Results not used in a shipped game yet


Advances in Real-Time Rendering course. Siggraph 2013 



ANTIALIASING \ deferred MSAA review

The problem: multiple passes + read/write from multisampled RTs

■ DX 10.1 introduced the SV_SampleIndex / SV_Coverage system value semantics.

■ Allows solving this via multipass for pixel/sample frequency passes [Thibieroz11]

SV_SampleIndex

■ Forces pixel shader execution for each sub-sample and provides the index of the sub-sample currently executed

■ Index can be used to fetch a sub-sample from a multisampled RT, e.g. FooMS.Load(UnnormScreenCoord, nSampleIndex)

SV_Coverage

■ Indicates to the pixel shader which sub-samples were covered during the raster stage.

■ Can also modify sub-sample coverage for a custom coverage mask

DX 11.0 compute tile-based deferred shading/lighting MSAA is simpler

■ Loop through MSAA tagged sub-samples 
DEFERRED MSAA \ heads up!

Simple theory, troublesome practice 

■ At least with complex deferred renderers

Non-MSAA friendly code accumulates fast. 

■ Breaks regularly, as new techniques are added without MSAA consideration

■ Even if it still works, very often you'll need to pinpoint and fix non-MSAA-friendly techniques, as these introduce visual artifacts.

■ E.g. white/dark outlines, or no AA at all

Do it upfront. Retrofitting a renderer to support deferred MSAA is some work

■ And it is very finicky



DEFERRED MSAA \ custom resolve & per-sample mask

Post G-Buffer, perform a custom msaa resolve 

■ Pre-resolves sample 0, for pixel frequency passes such as lighting/other MSAA dependent passes

■ In the same pass, creates the sub-sample mask (compare samples for similarity, mark if mismatching)

■ Avoid the default SV_Coverage, since it results in redundant processing on regions not requiring MSAA
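
A minimal CPU-side sketch of the custom resolve and per-sample mask described above. It assumes each pixel carries a list of scalar G-buffer values (e.g. depth) per sub-sample; the function name and the similarity threshold are hypothetical, not taken from the slides.

```python
def resolve_and_mask(samples, threshold=0.01):
    """Pre-resolve a pixel to sample 0 and flag it if its sub-samples differ.

    samples: per-sub-sample scalar values for one pixel (assumed layout).
    Returns (pre_resolved_value, needs_per_sample_processing).
    """
    first = samples[0]
    # Mark the pixel when any sub-sample mismatches sample 0.
    mismatch = any(abs(s - first) > threshold for s in samples[1:])
    return first, mismatch


# An interior pixel (all sub-samples agree) and an edge pixel (they differ).
interior_result = resolve_and_mask([0.50, 0.50])
edge_result = resolve_and_mask([0.50, 0.90])
```

Only pixels flagged `True` would need sample frequency passes; all others can be shaded once from the pre-resolved value.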
DEFERRED MSAA \ stencil batching [Sousa13]

Batching per-sample stencil mask with regular stencil buffer usage

■ Reserve 1 bit from the stencil buffer

■ Update with sub-sample mask 

■ Tag entire pixel-quad instead of just single pixel -> improves stencil culling efficiency 

■ Make usage of stencil read/write bitmask to avoid per-sample bit override 
■ StencilWriteMask = 0x7F 

■ Restore whenever a stencil clear occurs 


Not possible due to extreme stencil usage?

■ Use clip/discard 

■ Extra overhead also from additional texture read for per-sample mask 



DEFERRED MSAA \ pixel and sample frequency passes


Pixel Frequency Passes 

■ Set stencil read mask to reserved bits for per-pixel regions (~0x80)

■ Bind pre-resolved (non-multisampled) targets SRVs 

■ Render pass as usual 


Sample Frequency Passes 

■ Set stencil read mask to reserved bit for per-sample regions (0x80)

■ Bind multisampled targets SRVs

■ Index current sub-sample via SV_SampleIndex

■ Render pass as usual 
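
The stencil bit reservation can be illustrated with a small emulation of masked stencil writes and tests. This is a sketch under assumptions: the helper names are hypothetical, and an EQUAL comparison is assumed for the stencil test; only the 0x80 / 0x7F masks come from the slides.

```python
PER_SAMPLE_BIT = 0x80      # reserved stencil bit tagging per-sample regions
REGULAR_MASK = 0x7F        # ~0x80 within 8 bits: everything except the reserved bit


def stencil_write(current, value, write_mask):
    """Masked stencil write: only the bits set in write_mask are replaced."""
    return (current & ~write_mask & 0xFF) | (value & write_mask)


def stencil_equal(stencil, ref, read_mask):
    """EQUAL stencil test evaluated under a read mask."""
    return (stencil & read_mask) == (ref & read_mask)


# Tag a pixel as per-sample, then let unrelated stencil usage write to it.
tagged = stencil_write(0x00, PER_SAMPLE_BIT, PER_SAMPLE_BIT)
tagged = stencil_write(tagged, 0x05, REGULAR_MASK)

# Pixel frequency pass: read mask ~0x80, so its own stencil test ignores the tag.
pixel_pass_hit = stencil_equal(tagged, 0x05, REGULAR_MASK)
# Sample frequency pass: read mask 0x80, runs only on tagged pixels.
sample_pass_hit = stencil_equal(tagged, PER_SAMPLE_BIT, PER_SAMPLE_BIT)
sample_pass_on_untagged = stencil_equal(0x05, PER_SAMPLE_BIT, PER_SAMPLE_BIT)
```

Because regular writes use the 0x7F write mask, the per-sample tag survives until the next stencil clear, which is when it has to be restored.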
DEFERRED MSAA \ alpha test SSAA

Alpha testing requires an ad hoc solution

■ Default SV_Coverage only applies to triangle edges

Create your own sub-sample coverage mask 

■ E.g. check if current sub-sample uses AT or not and set bit 


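// Sub-sample offsets in pixel units (2x MSAA in this example).
// uCoverageMask is assumed to start at 0 and to be output as the coverage mask.
static const float2 vMSAAOffsets[2] = { float2(0.25, 0.25), float2(-0.25, -0.25) };
const float2 vDDX = ddx(vTexCoord.xy);
const float2 vDDY = ddy(vTexCoord.xy);

[unroll] for (int s = 0; s < nSampleCount; ++s)
{
    // Move the texture coordinate to the position of sub-sample s.
    float2 vTexOffset = vMSAAOffsets[s].x * vDDX + (vMSAAOffsets[s].y * vDDY);
    float fAlpha = tex2D(DiffuseSmp, vTexCoord + vTexOffset).w;
    // Set the bit of sub-sample s when it passes the alpha test.
    uCoverageMask |= ((fAlpha - fAlphaRef) >= 0) ? (uint(0x1) << s) : 0;
}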
DEFERRED MSAA \ performance shortcuts


Deferred cascaded sun shadow maps

■ Render shadows as usual at pixel frequency 

■ Bilateral upscale during deferred shading composite pass 






DEFERRED MSAA \ performance shortcuts (2)

DEFERRED MSAA \ performance shortcuts (3)


Many games are also doing:

■ Skipping Alpha Test Super Sampling

■ Use alpha to coverage instead, or even no alpha test AA (let morphological AA tackle that)

■ Render only opaque with MSAA

■ Then render transparents without MSAA

■ Assuming HDR rendering: note that tone mapping is implicitly done post-resolve, resulting in loss of detail on high contrast regions
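
The tone mapping remark can be checked numerically. The sketch below compares resolving HDR sub-samples before and after tone mapping; the Reinhard operator and the sample values are assumptions chosen for illustration only.

```python
def reinhard(x):
    """Simple Reinhard tone mapping operator (an assumption, not from the slides)."""
    return x / (1.0 + x)


def resolve_then_tonemap(samples):
    """Average the HDR sub-samples first, then tone map (post-resolve tone mapping)."""
    return reinhard(sum(samples) / len(samples))


def tonemap_then_resolve(samples):
    """Tone map each sub-sample, then average."""
    return sum(reinhard(s) for s in samples) / len(samples)


# 4x MSAA edge pixel: three dark sub-samples and one very bright one.
edge_samples = [0.1, 0.1, 0.1, 50.0]
post_resolve = resolve_then_tonemap(edge_samples)
pre_resolve = tonemap_then_resolve(edge_samples)
```

With post-resolve tone mapping the single bright sub-sample dominates and the edge pixel ends up almost white, so the anti-aliased gradient is lost on high contrast edges.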
DEFERRED MSAA \ MSAA friendliness

Look out for these:

■ No MSAA noticeably working, or noticeable bright/dark silhouettes. 
DEFERRED MSAA \ recap

Accessing and/or rendering to Multisampled RTs? 

■ Then you need to care about accessing and outputting correct sub-sample 

In general always strive to minimize BW 

■ Avoid vanilla deferred lighting 

■ Prefer fully deferred, hybrids. Or just skip deferred altogether.

■ If deferred, prefer thin G-buffers

■ Each additional target on the G-buffer incurs export rate overhead [Thibieroz11]

■ NV/AMD (GCN): Export Cost = Cost(RT0) + Cost(RT1) + ...

■ Fat formats are half rate sampling cost for bilinear filtering modes on GCN [Thibieroz13]

■ For lighting/some HDR post processes: 32 bit R11G11B10F fmt suffices for most cases
ANTIALIASING + 4K RESOLUTIONS \ will we need


Likely can start getting creative here 
ANTIALIASING \ the quest for better (and fast) AA


2011: the boom year of alternative AA modes (and naming combos)

■ FXAA, MLAA, SMAA, SRAA, DEAA, GBAA, DLAA, ETC AA

■ "Filtering Approaches for Real-Time Anti-Aliasing" [Jimenez et al. 11]


Shading Anti-aliasing

■ "Mip mapping normal maps" [Toksvig04]

■ "Spectacular Specular: LEAN and CLEAN Specular Highlights" [Baker11]

■ "Rock-Solid Shading: Image Stability without Sacrificing Detail" [Hill12]
TEMPORAL SSAA \ SMAA 2TX/4X review


[Jimenez11][Sousa11]


Morphological AA + MSAA + Temporal SSAA combo

■ Balanced cost/quality tradeoff, techniques complement each other.
■ Temporal component uses 2 sub-pixel buffers.

■ Each frame adds a sub-pixel jitter for 2x SSAA.

■ Reproject previous frame and blend between current and previous frames, via velocity length weighting.

■ Preserves image sharpness + reasonable temporal stability
TEMPORAL AA \ common robustness flaws

Relying on opaque geometry information

■ Can't handle signal (color) changes nor transparency.

■ For correct result, all opaque geometry must output velocity

Pathological cases 

■ Alpha blended surfaces (e.g. particles), lighting/shadow/reflections/uv animation/etc 

■ Any scatter and alike post processes, before the AA resolve 

Can result in distracting errors 

■ E.g. "ghosting" on transparency, lighting, shadows and such 

■ Silhouettes might appear, from scatter and alike post processes (e.g. bloom) 

Multi-GPU 

■ Simplest solution: force resource sync

■ NVIDIA exposes a driver hint to force resource sync, via NVAPI. This is the solution used by NVIDIA's TXAA

■ Note to hw vendors: would be great if all vendors exposed such (even better if Multi-GPU API functionality generalized)
SMAA 1TX \ a more robust temporal AA

Concept: Only track signal changes, don't rely on geometry information

■ For higher temporal stability: accumulate multiple frames in an accumulation buffer, alike TXAA [Lottes12]

■ Re-project accumulation buffer

■ Weighting: Map acc. buffer colors into the range of curr. frame neighborhood color extents [Malan2012]; different weight for hi/low frequency regions (for sharpness preservation).

[Figure panels: Current Frame (t0); Accumulation Buffer (tN)]
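SMAA 1TX \ a more robust temporal AA

Sample code

// Current frame (center tap) and re-projected accumulation buffer tap.
// Both are sampled from tex0 as shown on the slide; the accumulation buffer
// is presumably bound as a separate texture in practice.
float3 cM   = tex2D(tex0, tc.xy).rgb;
float3 cAcc = tex2D(tex0, reproj_tc.xy).rgb;

// Current frame neighborhood: top-left, top-right, bottom-left, bottom-right.
float3 cTL = tex2D(tex0, tc0.xy).rgb;
float3 cTR = tex2D(tex0, tc0.zw).rgb;
float3 cBL = tex2D(tex0, tc1.xy).rgb;
float3 cBR = tex2D(tex0, tc1.zw).rgb;

// Neighborhood color extents.
float3 cMax = max(cTL, max(cTR, max(cBL, cBR)));
float3 cMin = min(cTL, min(cTR, min(cBL, cBR)));

// High/low frequency measure: distance of the center from the neighborhood average.
float3 wk = abs((cTL + cTR + cBL + cBR) * 0.25 - cM);

// Clamp history into the neighborhood range, then blend with a frequency dependent weight (kl/kh).
return lerp(cM, clamp(cAcc, cMin, cMax), saturate(rcp(lerp(kl, kh, wk))));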
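
For experimenting outside a shader, the same resolve can be sketched on the CPU. This is a hedged port of the snippet above: colors are plain RGB tuples, and the `k_low` / `k_high` values used in the usage lines are hypothetical since the slides do not give them.

```python
def lerp(a, b, t):
    return a + (b - a) * t


def saturate(x):
    return min(max(x, 0.0), 1.0)


def smaa_1tx_resolve(c_m, c_acc, neighbors, k_low, k_high):
    """Blend the current color with history clamped to the neighborhood extents.

    c_m: current frame RGB; c_acc: re-projected accumulation RGB;
    neighbors: four RGB taps around the current pixel.
    """
    out = []
    for ch in range(3):
        taps = [n[ch] for n in neighbors]
        c_min, c_max = min(taps), max(taps)
        # Distance from the neighborhood average: large on high frequency regions.
        wk = abs(sum(taps) * 0.25 - c_m[ch])
        clamped = min(max(c_acc[ch], c_min), c_max)
        weight = saturate(1.0 / lerp(k_low, k_high, wk))
        out.append(lerp(c_m[ch], clamped, weight))
    return tuple(out)


# A stale bright history value over a flat dark region is clamped away (no ghosting).
flat = smaa_1tx_resolve((0.2,) * 3, (1.0,) * 3, [(0.2,) * 3] * 4, 2.0, 4.0)
# History inside the neighborhood range is blended in with the computed weight.
edge = smaa_1tx_resolve((0.0,) * 3, (1.0,) * 3,
                        [(0.0,) * 3, (0.0,) * 3, (1.0,) * 3, (1.0,) * 3], 2.0, 2.0)
```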
DEPTH OF FIELD

DEPTH OF FIELD \ plausible DOF: parameterization

Artist friendly parameters are one reason why game DOF tends to look wrong

■ Typical controls such as "focus range" + "blur amount" and others have not much physical meaning 

■ CoC depends mainly on f-stops, focal length and focal distance. These last 2 directly affect FOV. 

■ If you want more Bokeh, you need to max your focal length + widen aperture. This means also getting closer or further 
from subject for proper framing. 

■ Not the typical way a game artist/programmer thinks about DOF.
DEPTH OF FIELD \ focal length

DEPTH OF FIELD \ f-stops

[Figure panels: 2 f-stops; 2.8 f-stops; 4 f-stops; 5.6 f-stops]

DEPTH OF FIELD \ focal distance

[Figure panel: 0.75 m]
DEPTH OF FIELD \ plausible DOF: bokeh

The out of focus region is commonly referred to in photography as "Bokeh" (Japanese word for blur)
Bokeh shape has a direct relation to camera aperture size (aka f-stops) and diaphragm blade count

■ Bigger aperture = more "circular" bokeh, smaller aperture = more polygonal bokeh

■ Polygonal bokeh look depends on diaphragm blade count
■ Blade count varies with lens characteristics

■ Bigger aperture = more light enters, smaller aperture = less light

■ On night shots, you might often notice more circular bokeh and more motion blur

Bokeh kernel is flat

■ Almost the same amount of light enters the camera iris from all directions

■ Edges might be in shadow, this is commonly known as vignetting
■ Poor lens manufacturing may introduce a vast array of optical aberrations [Wiki01]

This is the main reason why gaussian blur, diffusion DOF, and derivative techniques look wrong/visually unpleasant
DEPTH OF FIELD \ state of the art overview

Scatter based techniques [Cyril05][Sawada07][3DMark11][Mittring11][Sousa11]

■ Render 1 quad or tri per-pixel, scale based on CoC

Simple implementation and nice results. Downside: performance, particularly on shallow DOF

■ Variable/inconsistent fillrate hit, depending on near/far layers resolution and aperture size; might reach >5 ms
■ Quad generation phase has a fixed cost attached.



DEPTH OF FIELD \ state of the art overview (2)

Gather based: separable (inflexible kernel) vs. kernel flexibility 





[White11] [Andreev12] [Macintosh12]
DEPTH OF FIELD \ a plausible and efficient DOF reconstruction filter


Separable flexible filter: low bandwidth requirement + different bokeh shapes possible


■ 1st pass: N^2 taps (e.g. 7x7).

■ 2nd pass: N^2 taps (e.g. 3x3) for flood filling the shape

■ R11G11B10F: downscaled HDR scene; R8G8: CoC

■ Done at half resolution 

■ Far/Near fields processed in same pass 

■ Limit offset range to minimize undersampling 

■ Higher specs hw can have higher tap count 

Diaphragm and optical aberrations sim 
Physically based CoC 
DEPTH OF FIELD \ lens review

Pinhole "Lens"

■ A camera without a lens

■ Light has to pass through a single small aperture before hitting the image plane

■ Typical realtime rendering

Thin lens

■ Camera lenses have finite dimension

■ Light refracts through the lens until hitting the image plane.

■ F = Focal length

■ P = Plane in focus

■ I = Image distance

[Figure: thin lens diagram labelling P, the image plane and the circle of confusion]
DEPTH OF FIELD \ lens review (2)

The thin lens equation gives the relation between:

■ F = Focal length (where light starts getting in focus)

■ P = Plane in focus (camera focal distance)

■ I = Image distance (where image is projected in focus)

Circle of Confusion [Potmesil81]

■ f = f-stops (aka the f-number or focal ratio)

■ D = Object distance

■ A = Aperture diameter 

Simplifies to: 

■ Note: f and F are known variables from camera setup 

■ Folds down into a single mad in shader
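
The formulas themselves are not present in this text. As an assumption, the standard thin lens forms that match the symbols listed above (F, P, I, f, D, A) would read as follows; the last line shows why the result folds into one multiply-add on 1/D.

```latex
\begin{aligned}
\frac{1}{P} + \frac{1}{I} &= \frac{1}{F}
  && \text{(thin lens equation)} \\
CoC &= \left| \frac{A \, F \, (P - D)}{D \, (P - F)} \right|
  && \text{(circle of confusion)} \\
A &= \frac{F}{f}
  && \text{(aperture diameter from f-stops)} \\
CoC &= \left| \frac{F^{2} (P - D)}{f \, D \, (P - F)} \right|
     = \left| \frac{F^{2} P}{f \, (P - F)} \cdot \frac{1}{D} - \frac{F^{2}}{f \, (P - F)} \right|
  && \text{(scale and bias applied to } 1/D \text{)}
\end{aligned}
```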

Camera FOV: 

■ Typical film formats (or sensor), 35mm/70mm 

■ Can alternatively derive focal length from FOV
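
A small sketch of the camera setup math, assuming the standard thin lens circle of confusion and the usual pinhole relation between FOV, sensor width and focal length. Function names, the 36 mm sensor width and the example distances are illustrative assumptions.

```python
import math


def focal_length_from_fov(fov_radians, sensor_width):
    """Focal length F for a given horizontal FOV and sensor (film) width."""
    return 0.5 * sensor_width / math.tan(0.5 * fov_radians)


def coc_direct(F, f_stops, P, D):
    """Circle of confusion diameter, with aperture diameter A = F / f."""
    A = F / f_stops
    return abs(A * F * (P - D) / (D * (P - F)))


def coc_mad_constants(F, f_stops, P):
    """Precomputed scale and bias so that CoC = |scale * (1 / D) + bias|."""
    k = F * F / (f_stops * (P - F))
    return k * P, -k


# 40 degree FOV on a 36 mm wide sensor, f/2.8, focused at 2 m, object at 5 m.
F = focal_length_from_fov(math.radians(40.0), 0.036)
scale, bias = coc_mad_constants(F, 2.8, 2.0)
coc_mad = abs(scale * (1.0 / 5.0) + bias)
coc_ref = coc_direct(F, 2.8, 2.0, 5.0)
coc_in_focus = coc_direct(F, 2.8, 2.0, 2.0)
```

The scale and bias depend only on the camera setup, so they can be computed once per frame and the per-pixel work reduces to one multiply-add on the reciprocal distance.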
DEPTH OF FIELD \ sampling

Concentric Mapping [Shirley97] used for uniform sample distribution

■ Maps unit square to unit circle

■ Square mapped to $(a,b) \in [-1,1]^2$ and divided into 4 regions by lines $a=b$, $a=-b$

■ First region given by: $r = a,\quad \phi = \frac{\pi}{4}\,\frac{b}{a}$

Diaphragm simulation by morphing samples to n-gons

■ Via a modified equation for the regular polygon.
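
A sketch of the sampling step. `concentric_map` is the commonly used two-branch form of the concentric mapping; `ngon_morph` is only one possible way to pull disk samples toward a regular polygon (using the polar equation of a regular n-gon) and is an assumption, since the modified equation used in the talk is not given here.

```python
import math


def concentric_map(a, b):
    """Map (a, b) in [-1, 1]^2 to the unit disk, preserving relative areas."""
    if a == 0.0 and b == 0.0:
        return 0.0, 0.0
    if abs(a) > abs(b):
        r = a
        phi = (math.pi / 4.0) * (b / a)
    else:
        r = b
        phi = (math.pi / 2.0) - (math.pi / 4.0) * (a / b)
    return r * math.cos(phi), r * math.sin(phi)


def ngon_morph(x, y, blades, amount):
    """Pull a disk sample toward a regular polygon with `blades` sides.

    amount = 0 keeps the circle, amount = 1 gives the full polygon (hypothetical control).
    """
    r = math.hypot(x, y)
    if r == 0.0:
        return 0.0, 0.0
    phi = math.atan2(y, x)
    sector = 2.0 * math.pi / blades
    # Angle relative to the middle of the nearest polygon edge.
    local = phi - sector * math.floor((phi + 0.5 * sector) / sector)
    edge = math.cos(0.5 * sector) / math.cos(local)
    scale = 1.0 + (edge - 1.0) * amount
    return x * scale, y * scale


corner = concentric_map(1.0, 1.0)
right = concentric_map(1.0, 0.0)
square_edge = ngon_morph(1.0, 0.0, 4, 1.0)
unchanged = ngon_morph(1.0, 0.0, 4, 0.0)
```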
DEPTH OF FIELD \ sampling: 2nd iteration

DEPTH OF FIELD \ separable filter

[Figure caption: 1st iteration: 49 taps (0.426 ms)]

DEPTH OF FIELD \ reference vs separable filter

DEPTH OF FIELD \ diaphragm simulation in action

[Figure panels: 2 f-stops; 4 f-stops]

DEPTH OF FIELD \ tile min/max CoC



Tile Min/Max CoC 

■ Downscale CoC target k times (k = tile count) 

■ Take min fragment for far field, max fragment for near field 

■ R8G8 storage 

Used to process near/far fields in same pass 

■ Dynamic branching using Tile Min/Max CoC for both fields 

■ Balances cost between far/near 

■ Also used for scatter as gather approximation for near field 

Can fold cost with other post-processes 

■ Initial downscale cost folded with HDR scene downscale for bloom; also pack near/far fields HDR input into R11G11B10F, all in 1 pass
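
The tile min/max CoC downscale can be sketched on the CPU as below. It assumes separate far and near CoC images stored as 2D lists and a square tile of `k` pixels; names and layout are illustrative, not the engine's.

```python
def tile_min_max_coc(far_coc, near_coc, k):
    """Per tile: (min far CoC, max near CoC), i.e. the two channels of an R8G8 target."""
    h, w = len(far_coc), len(far_coc[0])
    tiles = []
    for ty in range(0, h, k):
        row = []
        for tx in range(0, w, k):
            ys = range(ty, min(ty + k, h))
            xs = range(tx, min(tx + k, w))
            far_min = min(far_coc[y][x] for y in ys for x in xs)
            near_max = max(near_coc[y][x] for y in ys for x in xs)
            row.append((far_min, near_max))
        tiles.append(row)
    return tiles


far = [[0.0, 0.0, 0.5, 0.6],
       [0.0, 0.0, 0.7, 0.8],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0]]
near = [[0.3, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.9]]
tiles = tile_min_max_coc(far, near, 2)
```

A tile whose max near CoC is zero contains no near field fragments, so a shader can branch out of the near field work for that tile.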



Tile max-CoC (near field) 
DEPTH OF FIELD \ far + near field processing



Both fields use half resolution input 

■ Careful: downscale is source of error due to bilinear filtering 

■ Use custom bilinear (bilateral) filter for downscaling 

Far Field 

■ Scale kernel size and weight samples with far CoC [Scheumerman05] 

■ Pre-multiply layer with far CoC [Gotanda09] 

■ Prevents bleeding artifacts from bilinear/separable filter

[Figure panels: no weighting; CoC weighting; CoC weighting + CoC pre-multiply]
Near Field 

■ Scatter as gather approximation

■ Scale kernel size + weight fragments with Tile Max CoC against near CoC

■ Pre-multiply with near CoC

■ Only want to blur near field fragments (cheap partial occlusion approximation)
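
Why CoC weighting prevents bleeding can be shown with a 1D far field gather. This is a simplified sketch: a box kernel over scalar colors, with the kernel radius scaled by the center CoC and each tap weighted by its own far CoC; the real filter is 2D and separable.

```python
def far_field_blur(colors, far_coc, max_radius):
    """1D far field gather with CoC-scaled kernel size and CoC-weighted taps."""
    n = len(colors)
    out = []
    for i in range(n):
        radius = int(max_radius * far_coc[i] + 0.5)  # kernel size scales with CoC
        acc, wacc = 0.0, 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = far_coc[j]           # in-focus taps (CoC 0) contribute nothing
            acc += colors[j] * w     # same as sampling a CoC pre-multiplied layer
            wacc += w
        out.append(acc / wacc if wacc > 0.0 else colors[i])
    return out


# Bright in-focus pixels on the left, dark blurred background on the right.
colors = [10.0, 10.0, 1.0, 1.0]
far_coc = [0.0, 0.0, 1.0, 1.0]
blurred = far_field_blur(colors, far_coc, 2)
```

Without the weighting, the background pixel next to the boundary would average to 5.5 and show a bright halo from the in-focus object.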





DEPTH OF FIELD \ final composite


Far field: upscale via bilateral filter 

■ Take 4 taps from half res CoC, compare against full res CoC

■ Weighted using bicubic filtering for quality [Sigg05] 

■ Far field CoC used for blending 
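
A minimal sketch of the far field bilateral upscale: four half resolution taps are weighted by how closely their CoC matches the full resolution CoC. The inverse-difference weight and the epsilon are assumptions, and the bicubic spatial weights mentioned above are left out for brevity.

```python
def bilateral_upsample(full_coc, half_taps, eps=1e-3):
    """half_taps: four (color, coc) pairs from the half resolution target."""
    acc, wacc = 0.0, 0.0
    for color, coc in half_taps:
        # Taps whose CoC differs from the full res CoC get a very small weight.
        w = 1.0 / (eps + abs(full_coc - coc))
        acc += color * w
        wacc += w
    return acc / wacc


# Full res pixel is out of focus (CoC 1); two half res taps match, two do not.
upsampled = bilateral_upsample(1.0, [(2.0, 1.0), (2.0, 1.0), (9.0, 0.0), (9.0, 0.0)])
```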

Near field: upscale carelessly 

■ Half resolution near field CoC used for blending 

■ Can bleed as much as possible 

■ Also using bicubic filtering 

Careful with blending

■ Linear blending doesn't look good (signal frequency soup)

■ Can be seen in many games, including all of the Crysis series (puts hat of shame)

■ Simple to address: use a non-linear blend factor instead.

[Figure panels: linear blend; non-linear blend (better)]
MOTION BLUR

MOTION BLUR \ shutter speed and f-stops review

Amount of motion blur is relative to camera shutter speed and f-stops usage

■ The longer the exposure (slower shutter), the more light received (and the bigger the amount of motion blur), and vice versa

■ The lower the f-stops, the faster the exposure can be (and the less motion blur), and vice versa

[Figure panels: 2 f-stops, shutter 1/20 sec; 4 f-stops, shutter 1/6 sec]
MOTION BLUR \ state of the art overview


Scatter via geometry expansion [Green03][Sawada07]

■ Requires additional geometry pass + GS usage



■ NRfndprGl 





[Figure labels: Velocity Map; Geometry Shader]
MOTION BLUR \ state of the art overview (2)


Scatter as gather [Sousa08][Gotanda09][Sousa11][McGuire12]

■ E.g. velocity dilation, velocity blur, tile max velocity; single vs. multiple pass composite; depth/v/obj ID masking; single pass DOF+MB
MOTION BLUR \ reconstruction filter for plausible MB [McGuire12]

Tile Max Velocity and Tile Neighbor Max Velocity

■ Downscale Velocity buffer by k times (k is tile count)

■ Take max length velocity at each step

[Figure labels: Velocity V: w x h; Color C: w x h; Depth Z: w x h]


Motion Blur Pass 

■ Tile Neighbor Max for early out 

■ Tile Max Velocity as center velocity tap

■ At each iteration step weight against full 
resolution ||V|| and Depth 
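
The two velocity downscale steps can be sketched as follows, assuming the velocity buffer is a 2D list of (vx, vy) tuples and tiles are k x k pixels. A tile whose neighbor max velocity is (near) zero can early out of the blur.

```python
def length_sq(v):
    return v[0] * v[0] + v[1] * v[1]


def tile_max(velocity, k):
    """Per k x k tile, keep the velocity with the largest length."""
    h, w = len(velocity), len(velocity[0])
    return [[max((velocity[y][x]
                  for y in range(ty, min(ty + k, h))
                  for x in range(tx, min(tx + k, w))), key=length_sq)
             for tx in range(0, w, k)]
            for ty in range(0, h, k)]


def neighbor_max(tiles):
    """Per tile, keep the longest velocity among the tile and its 8 neighbors."""
    th, tw = len(tiles), len(tiles[0])
    return [[max((tiles[ny][nx]
                  for ny in range(max(0, ty - 1), min(th, ty + 2))
                  for nx in range(max(0, tx - 1), min(tw, tx + 2))), key=length_sq)
             for tx in range(tw)]
            for ty in range(th)]


# 6x6 velocity buffer with a single fast pixel in the top-left corner.
velocity = [[(0.0, 0.0) for _ in range(6)] for _ in range(6)]
velocity[0][0] = (3.0, 4.0)
tiles = tile_max(velocity, 2)
neighbors = neighbor_max(tiles)
```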
MOTION BLUR \ an improved reconstruction filter

Performant Quality

■ Simplify and vectorize inner loop weight computation (ends up in a couple of mads)

■ Fat buffer sampling is half rate on GCN hw with bilinear (point filtering is full rate, but doesn't look good due to aliasing)

■ Inputs: R11G11B10F for scene, bake ||V|| and 8 bit depth into an R8G8 target

■ Make it separable. 2 passes [Sousa08]

[Figure captions: 1st iteration: 6 taps (0.236 ms); 2nd iteration: 6 taps (+0.236 ms; 36 taps)]
MOTION BLUR \ an improved reconstruction filter (2)

Inner loop sample

// vx (center tap: x = ||v||, y = depth) and lensq_vx are assumed to be set up before the loop.
const float2 tc = min_tc + blur_step * s;
const float lensq_xy = abs(min_len_xy + len_xy_step * s);

const float2 vy = tex2Dlod(tex1, float4(tc.xy, 0, 0)).xy; // x = ||v||, y = depth

// Soft depth comparison in both directions, then velocity based weights.
const float2 cmp_z = DepthCmp(float2(vx.y, vy.y), float2(vy.y, vx.y), 1);
const float4 cmp_v = VelCmp(lensq_xy, float2(vy.x, lensq_vx));
const float w = (dot(cmp_z.xy, cmp_v.xy) + (cmp_v.z * cmp_v.w) * 2);

acc.rgb += tex2Dlod(tex0, float4(tc.xy, 0, 0)).rgb * w;
wacc += w;

float2 DepthCmp(float2 z0, float2 z1, float2 fSoftZ) {
    return saturate((1.0f + z0 * fSoftZ) - z1 * fSoftZ);
}

float4 VelCmp(float lensq_xy, float2 vxy) {
    return saturate((1.0f - lensq_xy.xxxx * rcp(vxy.xyxy)) + float4(0.0f, 0.0f, 0.95f, 0.95f));
}
MOTION BLUR \ an improved reconstruction filter (3)

Output object velocity in G-Buffer (only when required)

■ Avoids separate geometry passes.

■ Rigid geometry: object distance < distance threshold

■ Deformable geometry: if amount of movement > movement threshold

■ Moving geometry rendered last

■ R8G8 fmt

Composite with camera velocity

■ Velocity encoded in gamma 2.0 space

■ Precision still insufficient, but not very noticeable in practice

Encode: $v_{enc} = \sqrt{|v.xy|} \cdot \mathrm{sgn}(v.xy) \cdot (127.0/255.0) + 0.5$

Decode: $v_{enc} = (v_{enc} - 127.0/255.0) / (127.0/255.0)$, then $v = (v_{enc} \cdot v_{enc}) \cdot \mathrm{sgn}(v_{enc})$

[Figure labels: Object velocity; Composite with camera velocity]
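
A round trip of the gamma 2.0 velocity encoding through an 8 bit channel, as a sketch. It assumes a normalized velocity component in [-1, 1], round-to-nearest UNORM quantization, and the 0.5 encode bias with a 127/255 decode offset as read from the slide; the function names are hypothetical.

```python
import math

SCALE = 127.0 / 255.0


def encode_velocity(v):
    """Encode one velocity component in [-1, 1] to an 8 bit code (gamma 2.0)."""
    e = math.sqrt(abs(v)) * math.copysign(1.0, v) * SCALE + 0.5
    # Quantize to 8 bit UNORM, rounding to nearest (assumed hardware behavior).
    return int(min(max(e, 0.0), 1.0) * 255.0 + 0.5)


def decode_velocity(code):
    """Decode an 8 bit code back to a velocity component."""
    e = (code / 255.0 - SCALE) / SCALE
    return e * e * math.copysign(1.0, e)


round_trip = {v: decode_velocity(encode_velocity(v)) for v in (-1.0, -0.25, 0.0, 0.25, 1.0)}
```

The square root spends more of the 8 bit range on small velocities, which is where quantization errors would otherwise be most visible.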
MOTION BLUR \ MB or DOF first?

In the real world MB/DOF occur simultaneously

■ A dream implementation: big N^2 kernel + batched DOF/MB
■ Or sprite based with MB quad stretching

■ Full resolution! 1 billion taps! FP16! Multiple layers!

But... performance still matters (consoles):

■ DOF before MB introduces less error when MB is happening in focus

■ This is because MB is a scatter-as-gather op relying on geometry data.

■ Any other similar op after it will introduce error. And vice versa.

■ Error from MB after DOF is less noticeable.

■ Order swap makes DOF harder to fold with other posts

■ Additional overhead

[Figure panels: MB before DOF; MB after DOF]
FINAL REMARKS

Practical MSAA details

■ Do's and Don'ts

SMAA 1TX: A More Robust Temporal AA

■ For just 4 extra texture ops and a couple of ALU

A Plausible and Performant DOF Reconstruction Filter

■ Separable flexible filter, any bokeh kernel shape doable

■ 1st pass: 0.426ms, 2nd pass: 0.064ms. Sum: 0.52ms for reconstruction filter *

An Improved Reconstruction Filter for Plausible Motion Blur

■ Separable, 1st pass: 0.236 ms, 2nd pass: 0.236ms. Sum: 0.472ms for reconstruction filter *
* 1080p + AMD 7970
QUESTIONS?



Tiago@Crytek.com / Twitter: Crytek_Tiago